Improved Temporal Difference Methods with Linear Function Approximation

Authors

  • DIMITRI P. BERTSEKAS
  • ANGELIA NEDICH
Abstract

Editor’s Summary: This chapter considers temporal difference algorithms within the context of infinite-horizon finite-state dynamic programming problems with discounted cost and linear cost function approximation. This problem arises as a subproblem in the policy iteration method of dynamic programming. Additional discussions of such problems can be found in Chapters 6 and 12. The advantage of the method presented here is that it is the first iterative temporal difference method that converges without requiring a diminishing step size. The chapter discusses the connections with Sutton's TD(λ) and with various versions of least squares that are based on value iteration. It is shown using both analysis and experiments that the proposed method is substantially faster, simpler, and more reliable than TD(λ). Comparisons are also made with the LSTD method of Boyan, and of Bradtke and Barto.
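
As a rough illustration of the kind of iteration the summary describes, here is a minimal numpy sketch of a least-squares policy-evaluation style TD(λ) update with a unit step size. The trajectory format, feature map, and regularization below are our assumptions for the sketch, not the chapter's specification.

```python
import numpy as np

def lspe_lambda(trajectory, phi, n_features, alpha=0.95, lam=0.5):
    """Least-squares policy-evaluation style TD(lambda) iteration (sketch).

    trajectory -- list of (state, next_state, cost) transitions simulated
                  under the policy being evaluated (assumed format)
    phi        -- feature map: state -> numpy array of length n_features
    alpha      -- discount factor; lam -- the TD(lambda) parameter
    """
    x = np.zeros(n_features)              # cost-approximation weights
    B = 1e-3 * np.eye(n_features)         # regularized feature Gram matrix
    A = np.zeros((n_features, n_features))
    b = np.zeros(n_features)
    z = np.zeros(n_features)              # eligibility trace

    for state, next_state, cost in trajectory:
        f, f_next = phi(state), phi(next_state)
        z = alpha * lam * z + f           # accumulate discounted trace
        B += np.outer(f, f)
        A += np.outer(z, alpha * f_next - f)
        b += cost * z
        # Unit step size: the point of the method is that no diminishing
        # step size is needed for convergence.
        x = x + np.linalg.solve(B, A @ x + b)
    return x
```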


Similar articles

Effective Multi-step Temporal-Difference Learning for Non-Linear Function Approximation

Multi-step temporal-difference (TD) learning, where the update targets contain information from multiple time steps ahead, is one of the most popular forms of TD learning for linear function approximation. The reason is that multi-step methods often yield substantially better performance than their single-step counterparts, due to a lower bias of the update targets. For non-linear function app...
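
To make the multi-step idea concrete, here is a minimal sketch of a single n-step semi-gradient TD update with linear function approximation; the episode layout and names are illustrative, not taken from the paper.

```python
import numpy as np

def n_step_td_update(w, phi, states, rewards, t, n, gamma=0.99, step=0.01):
    """One n-step semi-gradient TD update for a linear value function.

    states  -- s_0 .. s_T of one episode (s_T terminal)
    rewards -- r_0 .. r_{T-1}; phi maps a state to a feature vector
    """
    end = min(t + n, len(rewards))
    # n-step return: up to n discounted rewards plus a bootstrapped tail
    G = sum(gamma ** (k - t) * rewards[k] for k in range(t, end))
    if t + n < len(states) - 1:           # s_{t+n} is not terminal
        G += gamma ** n * (w @ phi(states[t + n]))
    f = phi(states[t])
    return w + step * (G - w @ f) * f     # lower-bias multi-step target
```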


Temporal Difference Approach to Playing Give-Away Checkers

In this paper we examine the application of temporal difference methods to learning a linear state value function approximation in the game of give-away checkers. Empirical results show that the TD(λ) algorithm can be successfully used to improve playing policy quality in this domain. Training games with strong and random opponents were considered. Results show that learning only on negative game...
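
For reference, one TD(λ) step with an eligibility trace, roughly the update such a training loop would perform; the variable names and reward convention are our assumptions (in game self-play the reward is typically zero until the final outcome at episode end).

```python
import numpy as np

def td_lambda_step(w, z, phi_s, phi_next, reward,
                   gamma=1.0, lam=0.7, step=0.01):
    """One TD(lambda) update of a linear state-value function.

    phi_s, phi_next -- feature vectors of the current and next position;
    reward is zero for non-terminal moves and the game outcome at the end.
    """
    delta = reward + gamma * (w @ phi_next) - w @ phi_s   # TD error
    z = gamma * lam * z + phi_s                           # eligibility trace
    return w + step * delta * z, z
```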


On Convergence of Emphatic Temporal-Difference Learning

We consider emphatic temporal-difference learning algorithms for policy evaluation in discounted Markov decision processes with finite spaces. Such algorithms were recently proposed by Sutton, Mahmood, and White (2015) as an improved solution to the problem of divergence of off-policy temporal-difference learning with linear function approximation. We present in this paper the first convergence...
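
The emphatic update itself is short; below is a sketch of one step following the form in Sutton, Mahmood, and White (2015), with our own variable names and an illustrative step size.

```python
import numpy as np

def etd_lambda_step(theta, e, F, phi, phi_next, reward, rho, rho_prev,
                    gamma=0.99, lam=0.8, interest=1.0, step=1e-3):
    """One emphatic TD(lambda) step for off-policy linear policy evaluation.

    rho, rho_prev -- importance-sampling ratios at the current and previous
    step; interest -- how much we care about accuracy in the current state.
    """
    F = rho_prev * gamma * F + interest        # followon trace
    M = lam * interest + (1.0 - lam) * F       # emphasis weighting
    e = rho * (gamma * lam * e + M * phi)      # emphatic eligibility trace
    delta = reward + gamma * (theta @ phi_next) - theta @ phi
    return theta + step * delta * e, e, F
```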


Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation

We introduce the first temporal-difference learning algorithms that converge with smooth value function approximators, such as neural networks. Conventional temporal-difference (TD) methods, such as TD(λ), Q-learning, and Sarsa, have been used successfully with function approximation in many applications. However, it is well known that off-policy sampling, as well as nonlinear function approximat...
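
For contrast, here is the conventional nonlinear baseline such results are about: a semi-gradient TD(0) step with a one-hidden-layer value network, written out with explicit gradients. This is a sketch of the method whose potential divergence motivates the convergent algorithms, not of the algorithms the paper introduces.

```python
import numpy as np

def nonlinear_td0_step(params, s, s_next, reward, gamma=0.99, step=1e-3):
    """Semi-gradient TD(0) with a one-hidden-layer value network.

    params = (W1, b1, w2); only V(s) is differentiated, not the
    bootstrapped target V(s') -- hence "semi-gradient".
    """
    W1, b1, w2 = params
    h = np.tanh(W1 @ s + b1)
    v = w2 @ h                                   # V(s)
    v_next = w2 @ np.tanh(W1 @ s_next + b1)      # V(s'), held constant
    delta = reward + gamma * v_next - v          # TD error

    g_w2 = h                                     # dV(s)/dw2
    g_b1 = w2 * (1.0 - h ** 2)                   # dV(s)/db1
    g_W1 = np.outer(g_b1, s)                     # dV(s)/dW1
    return (W1 + step * delta * g_W1,
            b1 + step * delta * g_b1,
            w2 + step * delta * g_w2)
```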


Fast Gradient-Descent Methods for Temporal-Difference Learning with Linear Function Approximation

Sutton, Szepesvári and Maei (2009) recently introduced the first temporal-difference learning algorithm compatible with both linear function approximation and off-policy training, and whose complexity scales only linearly in the size of the function approximator. Although their “gradient temporal difference” (GTD) algorithm converges reliably, it can be very slow compared to conventional linear...
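
A sketch of one GTD2 update with its auxiliary weight vector, after Sutton, Szepesvári, and Maei (2009); the step sizes and names here are illustrative, and importance-sampling corrections for off-policy training are omitted for brevity.

```python
import numpy as np

def gtd2_step(theta, w, phi, phi_next, reward, gamma=0.99,
              alpha=1e-2, beta=1e-1):
    """One GTD2 step: theta are the value weights; w estimates the
    expected TD error projected onto the features."""
    delta = reward + gamma * (theta @ phi_next) - theta @ phi  # TD error
    theta = theta + alpha * (phi - gamma * phi_next) * (w @ phi)
    w = w + beta * (delta - w @ phi) * phi
    return theta, w
```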




Publication date: 2008